BEAVER: An Efficient Deterministic LLM Verifier

Suresh, Tarun, Wadhwa, Nalin, Banerjee, Debangshu, Singh, Gagandeep

arXiv.org Artificial Intelligence

As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify that model outputs satisfy required constraints. While sampling-based estimates provide an intuition of model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM constraint satisfaction. Given any prefix-closed semantic constraint, BEAVER systematically explores the generation space using novel token trie and frontier data structures, maintaining provably sound bounds at every iteration. We formalize the verification problem, prove the soundness of our approach, and evaluate BEAVER on correctness verification, privacy verification, and secure code generation tasks across multiple state-of-the-art LLMs. BEAVER achieves 6 to 8 times tighter probability bounds and identifies 3 to 4 times more high-risk instances compared to baseline methods under identical computational budgets, enabling precise characterization and risk assessment that loose bounds or empirical evaluation cannot provide.
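
The anytime bounding procedure described in the abstract can be pictured with a small sketch. Everything below is an illustrative assumption rather than BEAVER's actual interface: next_token_probs, satisfies, and may_satisfy stand in for the model and the prefix-closed constraint, and the paper's token trie and frontier structures are collapsed into a plain priority queue. The intended invariant is that the returned lower and upper bounds on the satisfaction probability are sound after every expansion.

```python
import heapq

EOS = "<eos>"

def verify(next_token_probs, satisfies, may_satisfy, budget=1000):
    """Sound lower/upper bounds on P(sampled output satisfies the constraint).

    next_token_probs(prefix) -> dict token -> prob   (hypothetical model interface)
    satisfies(seq)      -> True if a *complete* sequence meets the constraint
    may_satisfy(prefix) -> False only if no extension of prefix can satisfy it
                           (prefix-closedness makes this pruning sound)
    """
    lower, violating = 0.0, 0.0          # certified satisfying / violating mass
    frontier = [(-1.0, ())]              # max-heap on prefix probability mass
    for _ in range(budget):
        if not frontier:
            break
        neg_mass, prefix = heapq.heappop(frontier)
        mass = -neg_mass
        for tok, p in next_token_probs(prefix).items():
            child, child_mass = prefix + (tok,), mass * p
            if tok == EOS:                       # complete sequence: classify exactly
                if satisfies(child):
                    lower += child_mass
                else:
                    violating += child_mass
            elif not may_satisfy(child):         # constraint already refuted
                violating += child_mass
            else:                                # still undecided: keep exploring
                heapq.heappush(frontier, (-child_mass, child))
    upper = 1.0 - violating                      # unexplored mass counted optimistically
    return lower, upper
```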


Automating the Refinement of Reinforcement Learning Specifications

Ambadkar, Tanmay, Žikelić, Đorđe, Verma, Abhinav

arXiv.org Artificial Intelligence

Logical specifications have been shown to help reinforcement learning algorithms achieve complex tasks. However, when a task is under-specified, agents might fail to learn useful policies. In this work, we explore the possibility of improving coarse-grained logical specifications via an exploration-guided strategy. We propose AutoSpec, a framework that searches for a logical specification refinement whose satisfaction implies satisfaction of the original specification, but which provides additional guidance, thereby making it easier for reinforcement learning algorithms to learn useful policies. AutoSpec is applicable to reinforcement learning tasks specified via the SpectRL specification logic. We exploit the compositional nature of specifications written in SpectRL and design four refinement procedures that modify the abstract graph of the specification, either by refining its existing edge specifications or by introducing new edge specifications. We prove that all four procedures maintain specification soundness, i.e., any trajectory satisfying the refined specification also satisfies the original. We then show how AutoSpec can be integrated with existing reinforcement learning algorithms for learning policies from logical specifications. Our experiments demonstrate that the refined specifications produced by AutoSpec yield promising improvements in the complexity of control tasks that can be solved.
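
The refinement procedures operate on the specification's abstract graph. As a rough illustration only (not one of AutoSpec's actual four procedures), the sketch below inserts an intermediate waypoint on an edge: any trajectory that traverses both new edges under the original edge constraint also satisfies the original edge, so the refinement is sound while giving the learner an extra subgoal. The Edge/AbstractGraph encoding is an assumption made for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    src: str
    dst: str
    constraint: str   # placeholder for an edge specification, e.g. "avoid(obstacle)"

@dataclass
class AbstractGraph:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

def refine_edge(graph, edge, waypoint):
    """Replace src --phi--> dst by src --phi--> waypoint --phi--> dst.

    A trajectory satisfying the refined two-edge path still reaches dst under
    the same constraint phi, so it satisfies the original edge (soundness);
    the waypoint gives the RL agent an intermediate target to guide exploration.
    """
    graph.nodes.add(waypoint)
    graph.edges.remove(edge)
    graph.edges.append(Edge(edge.src, waypoint, edge.constraint))
    graph.edges.append(Edge(waypoint, edge.dst, edge.constraint))
    return graph
```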


Space Explanations of Neural Network Classification

Labbaf, Faezeh, Kolárik, Tomáš, Blicha, Martin, Fedyukovich, Grigory, Wand, Michael, Sharygina, Natasha

arXiv.org Artificial Intelligence

Explainability of decision-making AI systems (XAI), and specifically of neural networks (NNs), is a key requirement for deploying AI in sensitive areas [18]. A recent trend in explaining NNs is based on formal methods and logic, providing explanations for the decisions of machine learning systems [24, 31, 32, 41, 42, 44] accompanied by provable guarantees regarding their correctness. Yet, rigorous exploration of the continuous feature space requires estimating decision boundaries with complex shapes. This remains a challenge because existing explanations [24, 31, 32, 41, 42, 44] constrain only individual features and hence fail to capture relationships among the features that are essential to understanding the reasons behind the multi-parametrized classification process. We address the need to provide interpretations of NN systems that are as meaningful as possible using a novel concept of Space Explanations, delivered by a flexible symbolic reasoning framework with Craig interpolation [12] at the heart of the machinery.
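
Space explanations constrain several features jointly rather than one feature at a time. The sketch below conveys only the flavour of such a check, using an off-the-shelf SMT solver rather than the paper's interpolation-based machinery; the toy classifier and the candidate explanation are illustrative assumptions. The solver certifies that the explanation region lies entirely on one side of the decision boundary, something a per-feature box over x1 and x2 could not express here.

```python
from z3 import Real, Solver, And, Not, unsat

x1, x2 = Real("x1"), Real("x2")

# Toy model (assumed): predict class A iff 2*x1 - x2 > 1
predicts_A = 2 * x1 - x2 > 1

# Candidate space explanation coupling the two features (not a per-feature box):
explanation = And(x1 - x2 > 2, x1 >= 0)

s = Solver()
s.add(explanation)          # inputs covered by the explanation ...
s.add(Not(predicts_A))      # ... on which the model would NOT predict class A
if s.check() == unsat:
    print("explanation is sufficient: every covered input is classified A")
else:
    print("counterexample:", s.model())
```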


Extracting Robust Register Automata from Neural Networks over Data Sequences

Hong, Chih-Duo, Jiang, Hongjian, Lin, Anthony W., Markgraf, Oliver, Parsert, Julian, Tan, Tony

arXiv.org Artificial Intelligence

Automata extraction is a method for synthesising interpretable surrogates for black-box neural models that can be analysed symbolically. Existing techniques assume a finite input alphabet, and thus are not directly applicable to data sequences drawn from continuous domains. We address this challenge with deterministic register automata (DRAs), which extend finite automata with registers that store and compare numeric values. Our main contribution is a framework for robust DRA extraction from black-box models: we develop a polynomial-time robustness checker for DRAs with a fixed number of registers, and combine it with passive and active automata learning algorithms. This combination yields surrogate DRAs with statistical robustness and equivalence guarantees. As a key application, we use the extracted automata to assess the robustness of neural networks: for a given sequence and distance metric, the DRA either certifies local robustness or produces a concrete counterexample. Experiments on recurrent neural networks and transformer architectures show that our framework reliably learns accurate automata and enables principled robustness evaluation. Overall, our results demonstrate that robust DRA extraction effectively bridges neural network interpretability and formal reasoning without requiring white-box access to the underlying network.
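
Deterministic register automata extend finite automata with registers that store data values and with transition guards that compare the current datum against those registers. The concrete automaton below, which accepts strictly increasing numeric sequences with a single register, is an illustrative assumption rather than one extracted by the paper's framework; it only shows the kind of surrogate object the extraction produces.

```python
def make_increasing_dra():
    """One-register DRA accepting strictly increasing sequences."""
    def step(state, value):
        q, reg = state
        if q == "q" and (reg is None or value > reg):
            return ("q", value)        # guard holds: store the new value in the register
        return ("sink", reg)           # guard violated: reject all extensions
    return step, ("q", None), {"q"}    # transition fn, initial state, accepting states

def accepts(dra, sequence):
    step, state, accepting = dra
    for value in sequence:
        state = step(state, value)
    return state[0] in accepting

dra = make_increasing_dra()
print(accepts(dra, [0.1, 0.5, 2.3]))   # True: strictly increasing
print(accepts(dra, [0.1, 0.5, 0.4]))   # False: guard value > register fails at 0.4
```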


Building Trustworthy AI by Addressing its 16+2 Desiderata with Goal-Directed Commonsense Reasoning

Tudor, Alexis R., Zeng, Yankai, Wang, Huaduo, Arias, Joaquin, Gupta, Gopal

arXiv.org Artificial Intelligence

Current advances in AI and its applicability have highlighted the need to ensure its trustworthiness for legal, ethical, and even commercial reasons. Sub-symbolic machine learning algorithms, such as LLMs, simulate reasoning but hallucinate, and their decisions cannot be explained or audited (crucial aspects for trustworthiness). On the other hand, rule-based reasoners, such as Cyc, are able to provide the chain of reasoning steps but are complex and rely on a large number of reasoners. We propose a middle ground using s(CASP), a goal-directed, constraint-based answer set programming reasoner that employs a small number of mechanisms to emulate reliable and explainable human-style commonsense reasoning. In this paper, we explain how s(CASP) supports the 16 desiderata for trustworthy AI introduced by Doug Lenat and Gary Marcus (2023), and two additional ones: inconsistency detection and the assumption of alternative worlds. To illustrate the feasibility and synergies of s(CASP), we present a range of diverse applications, including a conversational chatbot and a virtually embodied reasoner.
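
The flavour of goal-directed default reasoning that s(CASP) automates can be conveyed with a deliberately tiny sketch. The Python resolver below is an illustrative assumption, not s(CASP): it only performs backward chaining with negation as failure for the classic "birds fly unless they are penguins" default, whereas s(CASP) adds constraints, dual rules, and full justification trees on top of this style of top-down evaluation.

```python
facts = {("bird", "tweety"), ("bird", "pingu"), ("penguin", "pingu")}
rules = {  # head -> body literals; ("not", lit) is negation as failure
    ("flies", "X"): [("bird", "X"), ("not", ("penguin", "X"))],
}

def prove(goal):
    pred, arg = goal
    if (pred, arg) in facts:
        return True
    for (hpred, _), body in rules.items():
        if hpred != pred:
            continue
        ok = True
        for lit in body:
            if lit[0] == "not":                      # negation as failure
                npred, _ = lit[1]
                ok = ok and not prove((npred, arg))
            else:
                bpred, _ = lit
                ok = ok and prove((bpred, arg))
            if not ok:
                break
        if ok:
            return True
    return False

print(prove(("flies", "tweety")))   # True: the default applies
print(prove(("flies", "pingu")))    # False: the penguin exception blocks it
```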


Follow the STARs: Dynamic ω-Regular Shielding of Learned Policies

Anand, Ashwani, Nayak, Satya Prakash, Raha, Ritam, Schmuck, Anne-Kathrin

arXiv.org Artificial Intelligence

This paper presents a novel dynamic post-shielding framework that enforces the full class of ω-regular correctness properties over pre-computed probabilistic policies. This constitutes a paradigm shift from the predominant setting of safety shielding -- i.e., ensuring that nothing bad ever happens -- to a shielding process that additionally enforces liveness -- i.e., ensuring that something good eventually happens. At the core, our method uses Strategy-Template-based Adaptive Runtime Shields (STARs), which leverage permissive strategy templates to enable post-shielding with minimal interference. As their main feature, STARs introduce a mechanism to dynamically control interference, allowing a tunable enforcement parameter to balance formal obligations and task-specific behavior at runtime. This makes it possible to trigger more aggressive enforcement when needed, while allowing optimized policy choices otherwise. In addition, STARs support runtime adaptation to changing specifications or actuator failures, making them especially suited for cyber-physical applications. We evaluate STARs on a mobile robot benchmark to demonstrate their controllable interference when enforcing (incrementally updated) ω-regular correctness properties over learned probabilistic policies.
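
At runtime, a post-shield of this kind restricts the learned policy to actions the strategy template permits and decides how strongly to steer toward the template's own preference. The sketch below is an illustrative assumption about what such a tunable shield could look like, not the STARs algorithm itself: the allowed set and the recommended action are assumed to be supplied by the shield's synthesis backend, which is where the actual ω-regular reasoning lives.

```python
import random

def shielded_action(policy_probs, allowed, recommended, enforcement=0.5):
    """Post-shield a pre-computed probabilistic policy at runtime.

    policy_probs : dict action -> probability from the learned policy
    allowed      : actions currently permitted by the permissive strategy template
                   (assumed supplied by the shield's synthesis backend)
    recommended  : action the template would pick to make progress on the
                   liveness obligation (also assumed given)
    enforcement  : tunable parameter in [0, 1]; 0 defers to the policy whenever
                   it is compliant, 1 always follows the template's recommendation
    """
    if random.random() < enforcement:
        return recommended                      # aggressive enforcement step
    compliant = {a: p for a, p in policy_probs.items() if a in allowed}
    if not compliant:                           # policy mass is entirely non-compliant:
        return recommended                      # the shield must intervene
    actions, weights = zip(*compliant.items())
    return random.choices(actions, weights=weights, k=1)[0]
```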